Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
Authors
Abstract
Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.
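As a rough illustration of the heuristic labeling the abstract describes, the following minimal sketch matches knowledge-base facts against sentences and lets one entity pair carry several relations at once. The toy knowledge base, the substring matching, the function names `label_sentences` and `aggregate`, and the set-union aggregation are all assumptions made for illustration; they are not the paper's actual model or data.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (entity1, entity2) -> relations that hold.
# A single pair may carry several (overlapping) relations.
KB = {
    ("Jobs", "Apple"): {"Founded", "CEO-of"},
}


def label_sentences(sentences, kb):
    """Heuristically label sentences: any sentence mentioning both entities
    of a KB pair receives all relations known for that pair (noisy labels)."""
    labeled = []
    for sentence in sentences:
        for (e1, e2), relations in kb.items():
            # Naive substring matching stands in for real entity linking.
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, (e1, e2), set(relations)))
    return labeled


def aggregate(sentence_predictions):
    """Assumed corpus-level step: union the sentence-level predictions
    for each entity pair, skipping sentences that express no relation."""
    facts = defaultdict(set)
    for pair, relation in sentence_predictions:
        if relation is not None:
            facts[pair].add(relation)
    return dict(facts)
```

Under these assumptions, a sentence such as "Jobs returned to Apple as CEO." would be labeled with both Founded and CEO-of, which is exactly the kind of noise that multi-instance learning is meant to absorb.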
Related resources
Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge
One of the challenges to information extraction is the requirement of human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., they look to knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision me...
Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text
One of the challenges to information extraction is the requirement of human-annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision, i.e., they look to knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on a hand-coded background ...
Distant Supervision for Relation Extraction Using Tree Kernels
In this paper we define a simple Relation Extraction system based on SVMs using tree kernels and employing a weakly supervised approach, known as Distant Supervision (DS). Our method uses the simple one-versus-all strategy to handle overlapping relations, i.e., defined on the same pair of entities. The DS data is defined over the New York Times corpus by means of Freebase as an external knowled...
A Comparison of Weak Supervision Methods for Knowledge Base Construction
We present a comparison of weak and distant supervision methods for producing proxy examples for supervised relation extraction. We find that knowledge-based weak supervision tends to outperform popular distant supervision techniques, providing a higher yield of positive examples and more accurate models.
Ontological Smoothing for Relation Extraction
There is increasing interest in relation extraction, methods that convert natural language text into structured knowledge. The most successful techniques use supervised machine learning to generate extractors from sentences which have been labeled with the arguments of the relations of interest. Unfortunately, these methods require hundreds or thousands of training examples, which are expensive...